The Use of yig-cha and chos-kyi-rnam-grangs in Computing Lexical Cohesion for Tibetan Topic Boundary Detection

Author

  • Paul G. Hackett
Abstract

To properly implement even a simple Tibetan Information Retrieval (IR) system, segmentation of one form or another (n-gram, POS-tagging, dictionary substring matching, etc.) must be performed (see Hackett (2000b)). To take Tibetan indexing to a more sophisticated level, however, some form of topic detection must be employed. This paper reports the results of a pilot study on the application to Tibetan of one technique for topic boundary detection: Lexical Cohesion. The resources developed and deployed, the theoretical model used, and its potential applications are discussed.

Introduction

In a previous paper (Hackett, 2000b) we demonstrated a method for performing word segmentation in conjunction with part-of-speech tagging and sentence boundary detection. While that method is sufficient for simple indexing and IR purposes, the assessment of larger-scale structures within a text allows for more precise searching, translation-equivalent disambiguation based on domain identification, and additional tagging possibilities. This paper reports the results of research deploying “lexical cohesion,” a method used by Kozima (1993) for topic boundary detection, modified for Tibetan. Given the lack of comparable lexical resources for less commonly studied languages like Tibetan, we exploit certain features of classical Tibetan literature, namely the literary genres of monastic textbooks (yig cha) and lists of enumerated phenomena (chos kyi rnam grangs), to build a keyword correlation database for use in computing “Lexical Cohesion Profiles” (LCP) for Tibetan texts.
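As a rough illustration of the general idea, the Python sketch below builds a toy keyword correlation table from enumerated lists and uses it to compute a sliding-window cohesion profile whose local minima are treated as candidate topic boundaries. It is a deliberately simplified stand-in, not the implementation used in the study and not Kozima's formulation: the binary co-membership score, the window size, the function names, and the two sample lists (the five aggregates and the four truths, in Wylie transliteration) are all assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def build_correlations(enumerated_lists):
    """Count, for each pair of terms, how many enumerated lists contain both."""
    counts = defaultdict(int)
    for terms in enumerated_lists:
        # Sort so each unordered pair is stored under a single key (a < b).
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def pair_score(counts, a, b):
    """Assumed scoring: 1.0 if the terms are identical or share a list, else 0.0."""
    if a == b:
        return 1.0
    key = (a, b) if a < b else (b, a)
    return 1.0 if counts.get(key, 0) > 0 else 0.0


def cohesion_profile(tokens, counts, window=2):
    """Mean pairwise score inside each sliding window (window must be >= 2)."""
    profile = []
    for start in range(len(tokens) - window + 1):
        pairs = list(combinations(tokens[start:start + window], 2))
        profile.append(sum(pair_score(counts, a, b) for a, b in pairs) / len(pairs))
    return profile


def boundary_candidates(profile):
    """Indices where the profile dips to a local minimum."""
    return [
        i for i in range(1, len(profile) - 1)
        if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]
    ]


# Illustrative data only: two well-known enumerations in Wylie transliteration.
ENUMERATED_LISTS = [
    ["gzugs", "tshor ba", "'du shes", "'du byed", "rnam shes"],  # five aggregates
    ["sdug bsngal", "kun 'byung", "'gog pa", "lam"],             # four truths
]

# A toy, already-segmented token stream that moves from one topic to the other.
tokens = ["gzugs", "tshor ba", "'du shes", "rnam shes",
          "sdug bsngal", "kun 'byung", "'gog pa", "lam"]

counts = build_correlations(ENUMERATED_LISTS)
profile = cohesion_profile(tokens, counts, window=2)
boundaries = boundary_candidates(profile)
```

In this toy run the profile dips at the window that straddles the last aggregate term and the first term of the four truths, which is where a topic boundary would be proposed. A real system would need graded correlation weights, a wider and smoothed window, and upstream word segmentation of the kind the paper assumes.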





Publication date: 2010